S&P 500 Latest Stocks Analysis¶

=====================================

In this notebook, we will delve into the most recent 2019-2024 stocks from the index, focusing on:

Analysis Breakdown¶

  • 30%: Most Traded Average Stocks
    • Analyzing the most frequently traded stocks in recent years
  • 70%: Top 6 Tech Stocks
    • In-depth analysis of the top 6 tech stocks from the index, including:
      • Technical Analysis
      • Comparative Analysis
        • Overall and most similar stocks
      • Fundamental Analysis
      • Quantitative Analysis
        • Building a predictive model
In [1]:
import pandas as pd 
from datetime import datetime, timedelta
import warnings
import yfinance as yf
warnings.filterwarnings('ignore')
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta 
from scipy import stats
import matplotlib.dates as mdates
from matplotlib.colors import LinearSegmentedColormap
from plotly import tools
import plotly.tools as tls
import plotly.figure_factory as ff
import plotly.graph_objs as go
from plotly.subplots import make_subplots
from plotly.offline import init_notebook_mode, plot, iplot
import statsmodels.api as sm
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
In [2]:
df1=pd.read_csv(r"C:\Users\PMLS\Downloads\sAND P TOP 5000 COAMPNIES NAMES\sp500_stocks.csv")
df1.head()
Out[2]:
Date Symbol Adj Close Close High Low Open Volume
0 2010-01-04 MMM 44.016724 69.414719 69.774246 69.122070 69.473244 3640265.0
1 2010-01-05 MMM 43.741020 68.979935 69.590302 68.311035 69.230766 3405012.0
2 2010-01-06 MMM 44.361343 69.958191 70.735786 69.824417 70.133781 6301126.0
3 2010-01-07 MMM 44.393166 70.008362 70.033447 68.662209 69.665550 5346240.0
4 2010-01-08 MMM 44.705982 70.501671 70.501671 69.648827 69.974915 4073337.0

Data Cleaning¶

In [3]:
df1['Date'] = pd.to_datetime(df1['Date'])

now_time = datetime.now()
five_years_ago = now_time - timedelta(days=5*365)

df = df1[df1['Date'] >= five_years_ago]
df = df.reset_index(drop=True)  
df
Out[3]:
Date Symbol Adj Close Close High Low Open Volume
0 2019-09-16 MMM 115.520706 141.864548 143.152176 140.392975 143.152176 2941323.0
1 2019-09-17 MMM 114.431313 140.526749 140.944809 139.214050 140.267563 3248814.0
2 2019-09-18 MMM 114.002403 140.000000 140.510040 138.168900 140.142136 2507653.0
3 2019-09-19 MMM 113.750465 139.690628 141.145493 139.556854 139.958191 1941108.0
4 2019-09-20 MMM 113.539391 139.431442 141.446487 139.197327 139.991638 4423884.0
... ... ... ... ... ... ... ... ...
629248 2024-08-28 ZTS 182.910004 182.910004 184.610001 182.100006 182.100006 1485400.0
629249 2024-08-29 ZTS 182.889999 182.889999 184.710007 182.479996 183.910004 1359900.0
629250 2024-08-30 ZTS 183.490005 183.490005 184.220001 180.979996 183.089996 2328500.0
629251 2024-09-03 ZTS 183.570007 183.570007 184.750000 182.460007 182.710007 1680900.0
629252 2024-09-04 ZTS 187.330002 187.330002 187.500000 181.585007 183.190002 1846912.0

629253 rows × 8 columns

In [ ]:
# Ensure 'Date' column is in datetime format
df['Date'] = pd.to_datetime(df['Date'])

# Define the start and end dates
start_date = pd.to_datetime('2019-09-09')
end_date = pd.to_datetime('2024-09-04')

# Generate a date range from start_date to end_date
full_date_range = pd.date_range(start=start_date, end=end_date)

# Extract the unique dates from the 'Date' column in df_last_5_years
available_dates = pd.to_datetime(df['Date'].unique())

# Find the missing dates by comparing the full range with available dates
missing_dates = full_date_range.difference(available_dates)

# Get the count of missing dates
missing_count = len(missing_dates)

# Print the count of missing dates
print(f"Count of Missing Dates: {missing_count}")

# Print each missing date in a human-readable format
for missing_date in missing_dates:
    print(f"{missing_date.strftime('%d %B %Y')} is missing")

Continue with our Work For now on¶

In [5]:
df=df.rename(columns={"Symbol":"ticks"})
new_df= df.dropna()
new_df.isnull().sum()
Out[5]:
Date         0
ticks        0
Adj Close    0
Close        0
High         0
Low          0
Open         0
Volume       0
dtype: int64
In [6]:
new_df
Out[6]:
Date ticks Adj Close Close High Low Open Volume
0 2019-09-16 MMM 115.520706 141.864548 143.152176 140.392975 143.152176 2941323.0
1 2019-09-17 MMM 114.431313 140.526749 140.944809 139.214050 140.267563 3248814.0
2 2019-09-18 MMM 114.002403 140.000000 140.510040 138.168900 140.142136 2507653.0
3 2019-09-19 MMM 113.750465 139.690628 141.145493 139.556854 139.958191 1941108.0
4 2019-09-20 MMM 113.539391 139.431442 141.446487 139.197327 139.991638 4423884.0
... ... ... ... ... ... ... ... ...
629248 2024-08-28 ZTS 182.910004 182.910004 184.610001 182.100006 182.100006 1485400.0
629249 2024-08-29 ZTS 182.889999 182.889999 184.710007 182.479996 183.910004 1359900.0
629250 2024-08-30 ZTS 183.490005 183.490005 184.220001 180.979996 183.089996 2328500.0
629251 2024-09-03 ZTS 183.570007 183.570007 184.750000 182.460007 182.710007 1680900.0
629252 2024-09-04 ZTS 187.330002 187.330002 187.500000 181.585007 183.190002 1846912.0

621845 rows × 8 columns

In [7]:
new_df.columns = new_df.columns.str.lower()
In [8]:
mean_dict = {}

# find average of volume traded over a period of time using for loops
for key in new_df['ticks'].unique():
    value = new_df[new_df['ticks'] == key ]['volume'].mean()
    mean_dict[key]= value

print("Length of the mean of ticks dictionary:", len(mean_dict))

# convert dict to pandas dataframe
avaerage_s = pd.Series(mean_dict)
top10_s = avaerage_s.sort_values(ascending=False)[:10]

print("Top 10 company tickers with highest average traded stock volume:\n", top10_s.index)
Length of the mean of ticks dictionary: 503
Top 10 company tickers with highest average traded stock volume:
 Index(['NVDA', 'TSLA', 'AAPL', 'AMZN', 'F', 'AMD', 'BAC', 'T', 'AAL', 'CCL'], dtype='object')
  • F is Ford Motor Company
  • T is AT&T Inc.
In [9]:
def subdataframe(df, tick):
    
    # top 10 ticks
    ticks = list(top10_s.index)
    
    assert tick in ticks, """Stock tick does not belong to top 10 stocks by trade volume, please try any of these:\n
    ['NVDA', 'TSLA', 'AAPL', 'AMZN', 'F', 'AMD', 'BAC', 'T', 'AAL', 'CCL']"""
    
    ndf = new_df[new_df['ticks'] == tick]
    return ndf
In [10]:
sp500Cmp_info_df = pd.read_csv('C:/Users/PMLS/Downloads/sAND P TOP 5000 COAMPNIES NAMES/sp500_companies.csv')
Top10HigestTradeNVD=['NVDA', 'TSLA', 'AAPL', 'AMZN', 'F', 'AMD', 'BAC', 'T', 'AAL', 'CCL']
info_df=pd.DataFrame({'tick':Top10HigestTradeNVD})
info_df
Out[10]:
tick
0 NVDA
1 TSLA
2 AAPL
3 AMZN
4 F
5 AMD
6 BAC
7 T
8 AAL
9 CCL
In [11]:
Final_info = pd.merge(info_df, sp500Cmp_info_df, how='left', left_on='tick', right_on='Symbol')
Final_info =Final_info[['tick','Shortname','Sector','Industry']]
Final_info
Out[11]:
tick Shortname Sector Industry
0 NVDA NVIDIA Corporation Technology Semiconductors
1 TSLA Tesla, Inc. Consumer Cyclical Auto Manufacturers
2 AAPL Apple Inc. Technology Consumer Electronics
3 AMZN Amazon.com, Inc. Consumer Cyclical Internet Retail
4 F Ford Motor Company Consumer Cyclical Auto Manufacturers
5 AMD Advanced Micro Devices, Inc. Technology Semiconductors
6 BAC Bank of America Corporation Financial Services Banks - Diversified
7 T AT&T Inc. Communication Services Telecom Services
8 AAL American Airlines Group, Inc. Industrials Airlines
9 CCL Carnival Corporation Consumer Cyclical Travel Services
In [12]:
# Update company dict for your top 10 companies
company_dict = {
    'NVDA': 'NVIDIA', 
    'TSLA': 'Tesla', 
    'AAPL': 'Apple', 
    'AMZN': 'Amazon', 
    'F': 'Ford Motor Company', 
    'AMD': 'Advanced Micro Devices', 
    'BAC': 'Bank of America', 
    'T': 'AT&T', 
    'AAL': 'American Airlines', 
    'CCL': 'Carnival Corporation'
}

# Create sub-dataframes for each company
nvda_df = subdataframe(new_df, 'NVDA')
tsla_df = subdataframe(new_df, 'TSLA')
aapl_df = subdataframe(new_df, 'AAPL')
amzn_df = subdataframe(new_df, 'AMZN')
f_df = subdataframe(new_df, 'F')
amd_df = subdataframe(new_df, 'AMD')
bac_df = subdataframe(new_df, 'BAC')
t_df = subdataframe(new_df, 'T')
aal_df = subdataframe(new_df, 'AAL')
ccl_df = subdataframe(new_df, 'CCL')

# Define the function to calculate daily return and company name
def dailyfunc(df):
    df['daily return'] = ((df['close'] - df['open']) / df['open']) * 100
    df.style.format('{:.2f}%', subset='daily return')
    df['daily_mean'] = (df['open'] + df['close'] + df['high'] + df['low']) / 4
    df['co_name'] = company_dict[df['ticks'].unique()[0]]
    return df

# Apply dailyfunc to each company's dataframe
nvda_df = dailyfunc(nvda_df)
tsla_df = dailyfunc(tsla_df)
aapl_df = dailyfunc(aapl_df)
amzn_df = dailyfunc(amzn_df)
f_df = dailyfunc(f_df)
amd_df = dailyfunc(amd_df)
bac_df = dailyfunc(bac_df)
t_df = dailyfunc(t_df)
aal_df = dailyfunc(aal_df)
ccl_df = dailyfunc(ccl_df)

# Print the start and end date for each company One more for Confioatnos 
print('\t\tStart Date\t\t\t\t\tEnd Date')
print(f"NVDA\t\t{nvda_df['date'].min()}\t\t\t{nvda_df['date'].max()}")
print(f"TSLA\t\t{tsla_df['date'].min()}\t\t\t{tsla_df['date'].max()}")
print(f"AAPL\t\t{aapl_df['date'].min()}\t\t\t{aapl_df['date'].max()}")
print(f"AMZN\t\t{amzn_df['date'].min()}\t\t\t{amzn_df['date'].max()}")
print(f"F\t\t{f_df['date'].min()}\t\t\t{f_df['date'].max()}")
print(f"AMD\t\t{amd_df['date'].min()}\t\t\t{amd_df['date'].max()}")
print(f"BAC\t\t{bac_df['date'].min()}\t\t\t{bac_df['date'].max()}")
print(f"T\t\t{t_df['date'].min()}\t\t\t{t_df['date'].max()}")
print(f"AAL\t\t{aal_df['date'].min()}\t\t\t{aal_df['date'].max()}")
print(f"CCL\t\t{ccl_df['date'].min()}\t\t\t{ccl_df['date'].max()}")
		Start Date					End Date
NVDA		2019-09-16 00:00:00			2024-09-04 00:00:00
TSLA		2019-09-16 00:00:00			2024-09-04 00:00:00
AAPL		2019-09-16 00:00:00			2024-09-04 00:00:00
AMZN		2019-09-16 00:00:00			2024-09-04 00:00:00
F		2019-09-16 00:00:00			2024-09-04 00:00:00
AMD		2019-09-16 00:00:00			2024-09-04 00:00:00
BAC		2019-09-16 00:00:00			2024-09-04 00:00:00
T		2019-09-16 00:00:00			2024-09-04 00:00:00
AAL		2019-09-16 00:00:00			2024-09-04 00:00:00
CCL		2019-09-16 00:00:00			2024-09-04 00:00:00
  • Here Our Start Data and End Data for all the Stocks are Correct , Ensuring Data Validation

Technical Analysis¶

  • Lets Start from Finding The All time High Price of Our Most Traded Stocks
In [13]:
def plot_closing_stock_prices(dfs, ncols=2):
    # Calculate the number of rows needed
    nrows = (len(dfs) + ncols - 1) // ncols
    
    # Create the figure and axes for the subplots
    fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(15, 5 * nrows), facecolor='#f4f4f4')
    fig.subplots_adjust(hspace=0.4, wspace=0.3)
    
    for ax, df in zip(axes.flat, dfs):
        # Calculate the highest stock price and the corresponding date
        high = df['close'].max()
        datetime = df[df['close'] == high]['date'].values[0]  

        # Define color based on the stock ticker
        if df['ticks'].unique()[0] in ['GE', 'F']:
            facecolor = '#ed615c'
        else:
            facecolor = '#4bd659'
        
        # Plot the data
        ax.plot(df['date'], df['close'], color='#0f2113')
        ax.set_title(f"{df['co_name'].unique()[0]} Stock Price", fontsize=16, fontweight='bold')
        ax.set_xlabel("Date", fontsize=12)
        ax.set_ylabel("Daily Closing Stock Price", fontsize=12)
        ax.set_facecolor('#ffffff')  # Set background color for each subplot
        
        # Add annotation for the highest stock price
        ax.annotate(
            f"All time high price during\nfive year period\nwas ${high:.2f}",
            xy=(datetime, high),
            xytext=(datetime, high - 0.1 * high),
            bbox=dict(boxstyle="round", facecolor='#f5d3bf', edgecolor='#d0d5db'),
            arrowprops=dict(facecolor='#f0190a', headlength=25, shrink=0.1)
        )

    # Hide any unused subplots
    for ax in axes.flat[len(dfs):]:
        ax.set_visible(False)

    plt.show()

# Call the function with your dataframes
plot_closing_stock_prices([nvda_df, tsla_df, aapl_df, amzn_df, f_df, amd_df, bac_df, t_df, aal_df, ccl_df])
No description has been provided for this image

Trade Volume Analysis¶

Look at the exact numbers of volume of how much these stocks were traded

  • Identify when the trade volume was highest and at what value
In [14]:
def create_trade_volume_grid_single_row(dataframes):
    # Set the number of columns to be equal to the number of dataframes
    cols = len(dataframes)

    # Create subplot grid with 1 row and as many columns as there are dataframes
    fig = make_subplots(rows=1, cols=cols, subplot_titles=[df['co_name'].unique()[0] for df in dataframes])

    for i, df in enumerate(dataframes, 1):
        # All plots will be on row 1, so just adjust the column
        row = 1
        col = i

        # Calculate statistics
        ave_y = df['volume'].mean()
        max_y = df['volume'].max()
        min_y = df['volume'].min()
        max_date = df[df['volume'] == max_y]['date'].values[0]

        # Add trace for volume
        fig.add_trace(
            go.Scatter(x=df['date'], y=df['volume'], name='Volume', line=dict(color='#00CED1')),
            row=row, col=col
        )

        # Add horizontal lines for max, min, and average
        fig.add_hline(y=max_y, line_dash="dash", line_color="red", row=row, col=col)
        fig.add_hline(y=min_y, line_dash="dash", line_color="green", row=row, col=col)
        fig.add_hline(y=ave_y, line_dash="dash", line_color="yellow", row=row, col=col)

        # Add vertical line for max volume date
        fig.add_vline(x=max_date, line_dash="dash", line_color="purple", row=row, col=col)

        # Update axes
        fig.update_xaxes(title_text="Date", row=row, col=col)
        fig.update_yaxes(title_text="Volume", row=row, col=col)

    # Update layout
    fig.update_layout(height=300, width=400*cols, title_text="Stock Trade Volumes in Single Row",
                      showlegend=False, template="plotly_dark")

    return fig

# Example usage
dataframes = [nvda_df, tsla_df, aapl_df, amzn_df, f_df, amd_df, bac_df, t_df, aal_df, ccl_df]
fig = create_trade_volume_grid_single_row(dataframes)
fig.show()
In [15]:
list_df = [nvda_df, tsla_df, aapl_df, amzn_df, f_df, amd_df, bac_df, t_df, aal_df, ccl_df]

# loop through the the list_df to find mini and maxi of each stocks 
mini = [df[df['date'] == df['date'].min()]['close'].values.item() for df in list_df]
maxi = [df[df['date'] == df['date'].max()]['close'].values.item() for df in list_df]

# find list of abosolute difference between both stock price
diff = np.array(maxi) - np.array(mini)

# find the percentage growth
growth = (diff/mini)*100
growth_list = growth.tolist()
co_name_list = [df['co_name'].unique()[0] for df in list_df]

# visualize the growth of the stocks
fig, ax = plt.subplots(figsize=(13,7))
ax.barh(y=co_name_list, width=growth_list, height=0.9, color=['#4bd659','#4bd659','#4bd659','#4bd659','#4bd659',
                                                             '#4bd659','#4bd659','#ed615c','#ed615c','#ed615c'],
       edgecolor='#713ae8')
for p in ax.patches:
    ax.annotate(f'{round(p.get_width(),2)}%', (p.get_width()+15, p.get_y() +0.3))
ax.set_xlabel('Percentage growth in stock price')
ax.set_ylabel('Name of companies')
ax.set_title("Growth in stock price over a period of 5 years")
plt.show()
No description has been provided for this image

Comparitive Analysis of of Top Tech Stocks in the Index¶

Now we are going to Move onto form our Most traded stocks To A comparitive Analysis of Top Stocks in the Tech Sectory ,

  • To do this , Firstly we filtred out Top 5 tech stocks from the Orignal Dataframe Software - Infrastructure industry
In [16]:
# function to return top 10 sub dataframe
def subdataframe(df, tick):
    
    # top 10 ticks
    ticks = list(top10_s.index)
    
    assert tick in ticks, """Stock tick does not belong to top 10 stocks by trade volume, please try any of these:\n
    ['NVDA', 'TSLA', 'AAPL', 'AMZN', 'F', 'AMD', 'BAC', 'T', 'AAL', 'CCL']"""
    
    ndf = new_df[new_df['ticks'] == tick]
    return ndf
In [17]:
ab = sp500Cmp_info_df[sp500Cmp_info_df['Industry'] == 'Software - Infrastructure']
ab=ab.sort_values(by='Marketcap',ascending=False)
ab=ab.head(6)
ab
Out[17]:
Exchange Symbol Shortname Longname Sector Industry Currentprice Marketcap Ebitda Revenuegrowth City State Country Fulltimeemployees Longbusinesssummary Weight
1 NMS MSFT Microsoft Corporation Microsoft Corporation Technology Software - Infrastructure 408.90 3039369887744 1.294330e+11 0.152 Redmond WA United States 228000.0 Microsoft Corporation develops and supports so... 0.059346
20 NYQ ORCL Oracle Corporation Oracle Corporation Technology Software - Infrastructure 140.75 387887300608 2.122700e+10 0.033 Austin TX United States 159000.0 Oracle Corporation offers products and service... 0.007574
28 NMS ADBE Adobe Inc. Adobe Inc. Technology Software - Infrastructure 575.25 255065849856 7.844000e+09 0.102 San Jose CA United States 29945.0 Adobe Inc., together with its subsidiaries, op... 0.004980
87 NMS PANW Palo Alto Networks, Inc. Palo Alto Networks, Inc. Technology Software - Infrastructure 346.15 112533364736 9.601666e+08 0.121 Santa Clara CA United States NaN Palo Alto Networks, Inc. provides cybersecurit... 0.002197
130 NMS SNPS Synopsys, Inc. Synopsys, Inc. Technology Software - Infrastructure 477.53 73355296768 1.652390e+09 0.127 Sunnyvale CA United States 20300.0 Synopsys, Inc. provides electronic design auto... 0.001432
149 NMS CRWD CrowdStrike Holdings, Inc. CrowdStrike Holdings, Inc. Technology Software - Infrastructure 259.32 63566077952 1.874900e+08 0.317 Austin TX United States 9219.0 CrowdStrike Holdings, Inc. provides cybersecur... 0.001241
In [18]:
company_dict = {
    'MSFT': 'Microsoft',
    'ORCL': 'Oracle Corporation',
    'ADBE': 'Adobe Inc.',
    'PANW': 'Palo Alto Networks',
    'SNPS': 'Synopsys Inc.',
    'CRWD': 'CrowdStrike Holdings'
}
In [19]:
def subdataframe(df, tick, *ticks):
    ticks_list = list(ticks)
    assert tick in ticks_list, f"Stock tick does not belong to the provided list of ticks: {ticks_list}"
    ndf = df[df['ticks'] == tick]
    return ndf

TechStocks = ['MSFT', 'ORCL', 'ADBE', 'PANW', 'SNPS', 'CRWD']

# Create individual DataFrames for each stock
MSFT_df = subdataframe(new_df, 'MSFT', *TechStocks)
ORCL_df = subdataframe(new_df, 'ORCL', *TechStocks)
ADBE_df = subdataframe(new_df, 'ADBE', *TechStocks)
PANW_df = subdataframe(new_df, 'PANW', *TechStocks)
SNPS_df = subdataframe(new_df, 'SNPS', *TechStocks)
CRWD_df = subdataframe(new_df, 'CRWD', *TechStocks)

# Apply the dailyfunc function to each DataFrame
MSFT_df = dailyfunc(MSFT_df)
ORCL_df = dailyfunc(ORCL_df)
ADBE_df = dailyfunc(ADBE_df)
PANW_df = dailyfunc(PANW_df)
SNPS_df = dailyfunc(SNPS_df)
CRWD_df = dailyfunc(CRWD_df)

Mean Price of Each Stock¶

In [20]:
background_gradient = LinearSegmentedColormap.from_list("", ["#212121", "#1A1D23", "#03055B"])
line_colors = ['#FFFF00', '#00BFFF', '#32CD32', '#FF00FF', '#FFA500', '#d62728']  

# Create the figure and axis with dark background
fig, ax = plt.subplots(figsize=(14, 7))
fig.patch.set_facecolor('#1A1D23')  # Set figure background color to match theme
ax.set_facecolor('#212121')  # Set axis background color

# Plot each stock's daily mean price with distinct colors and line widths for visibility on dark theme
ax.plot(MSFT_df['date'], MSFT_df['daily_mean'], label='Microsoft (MSFT)', color=line_colors[0], lw=1.4)
ax.plot(ORCL_df['date'], ORCL_df['daily_mean'], label='Oracle (ORCL)', color=line_colors[1], lw=1.4)
ax.plot(ADBE_df['date'], ADBE_df['daily_mean'], label='Adobe (ADBE)', color=line_colors[2], lw=1.4)
ax.plot(PANW_df['date'], PANW_df['daily_mean'], label='Palo Alto Networks (PANW)', color=line_colors[3], lw=1.4)
ax.plot(SNPS_df['date'], SNPS_df['daily_mean'], label='Synopsys (SNPS)', color=line_colors[4], lw=1.4)
ax.plot(CRWD_df['date'], CRWD_df['daily_mean'], label='CrowdStrike (CRWD)', color=line_colors[5], lw=1.4)

# Format the x-axis to display years only
ax.xaxis.set_major_locator(mdates.YearLocator())  # Place tick at the start of each year
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))  # Format to show only the year
plt.xticks(rotation=0, color='white')  # Make tick labels white for dark background

# Add title and labels with improved formatting
ax.set_title("Comparative Analysis of Tech Stock Prices Based on Mean Price", fontsize=16, fontweight='bold', color='white')
ax.set_ylabel("Daily Average Stock Price", fontsize=14, color='white')

# Add legend with a clear background
ax.legend(facecolor='#1A1D23', fontsize="medium", title="Tech Stocks", title_fontsize=13, labelcolor='white', edgecolor='white')

# Add gridlines for better readability, change to a softer color to match the dark theme
ax.grid(True, color='#444444')

# Show the plot
plt.tight_layout()
plt.show()
No description has been provided for this image

Daily Mean Foruma = (Open + Close + High + Low) / 4

Volitatliy Analysis¶

In [21]:
dataframes = [MSFT_df, ORCL_df, ADBE_df, PANW_df, SNPS_df, CRWD_df]
labels = ['MSFT', 'ORCL', 'ADBE', 'PANW', 'SNPS', 'CRWD']

# Set up the plot
plt.figure(figsize=(14, 8))

# Loop through each DataFrame to plot only KDE (no histograms)
for df, label in zip(dataframes, labels):
    # Plot only KDE for each DataFrame's 'daily return' with a specific color palette for differentiation
    sns.kdeplot(df['daily return'], label=label, linewidth=2)

# Add labels and title
plt.title('KDE of Daily Returns for Top 6 Stocks')
plt.xlabel('Daily Return')
plt.ylabel('Density')
plt.legend(title='Stock')

# Show the plot
plt.show()
No description has been provided for this image

This Standardized chart shows the Variations , Here CRWD seems to the most Volatile amoung them ,

  • There is a better way to shwo this with the Box and Whisker Plot , so we could get the idea of the Outliers better
In [22]:
dataframes = [MSFT_df, ORCL_df, ADBE_df, PANW_df, SNPS_df, CRWD_df]
labels = ['MSFT', 'ORCL', 'ADBE', 'PANW', 'SNPS', 'CRWD']

# Create a combined DataFrame for plotting
combined_df = pd.DataFrame()

# Loop through dataframes and calculate standard deviation of 'daily return'
std_devs = {}
for df, label in zip(dataframes, labels):
    # Create a temporary DataFrame with a 'Stock' column
    temp_df = df[['daily return']].copy()
    temp_df['Stock'] = label
    
    # Append to the combined DataFrame
    combined_df = pd.concat([combined_df, temp_df])
    
    # Calculate the standard deviation for this stock and store it
    std_devs[label] = temp_df['daily return'].std()

# Sort stocks by their standard deviation (volatility)
sorted_labels = sorted(std_devs, key=std_devs.get)

# Set up the plot
plt.figure(figsize=(12, 6))

# Create the box plot, sorted by volatility
sns.boxplot(x='Stock', y='daily return', data=combined_df, order=sorted_labels, palette="Set2")

# Add labels and title
plt.title('Box Plot of Daily Returns for Top 6 Stocks (Sorted by Volatility)')
plt.ylabel('Daily Return')

# Show the plot
plt.show()
No description has been provided for this image

Risk Analysis¶

How much value do we put at risk by investing in a particular stock?

In [23]:
means = []
std_devs = []

# Loop through dataframes and calculate mean and standard deviation for each stock
for df in dataframes:
    mean_return = df['daily return'].mean()
    std_dev = df['daily return'].std()
    means.append(mean_return)
    std_devs.append(std_dev)

# Define size and color for scatter plot
area = np.pi * 20  # Size of the scatter plot marker
colors = ['blue', 'green', 'red', 'purple', 'orange', 'brown']  # Colors for each stock

# Create a scatter plot
plt.figure(figsize=(12, 8))

# Plot each stock with a different color and annotate
for mean, std, label, color in zip(means, std_devs, labels, colors):
    plt.scatter(mean, std, s=area, c=color, label=label)
    plt.annotate(label, xy=(mean, std), xytext=(50, 50), textcoords='offset points',
                 ha='right', va='bottom', 
                 arrowprops=dict(arrowstyle='-', color=color, connectionstyle='arc3,rad=-0.3'))

# Add labels, title, and legend
plt.xlabel('Expected Return (Mean)')
plt.ylabel('Risk (Standard Deviation)')
plt.title('Scatter Plot of Risk vs Expected Return for Top 6 Stocks')

# Show the plot
plt.show()
No description has been provided for this image
In [24]:
SNPS_df
Out[24]:
date ticks adj close close high low open volume daily return daily_mean co_name
540432 2019-09-16 SNPS 133.240005 133.240005 134.149994 132.529999 133.619995 1150600.0 -0.284381 133.384998 Synopsys Inc.
540433 2019-09-17 SNPS 136.699997 136.699997 136.880005 132.509995 133.050003 1028300.0 2.743325 134.785000 Synopsys Inc.
540434 2019-09-18 SNPS 136.500000 136.500000 137.070007 133.639999 136.660004 978700.0 -0.117082 135.967503 Synopsys Inc.
540435 2019-09-19 SNPS 137.130005 137.130005 138.729996 136.460007 136.669998 777300.0 0.336582 137.247501 Synopsys Inc.
540436 2019-09-20 SNPS 135.720001 135.720001 137.990005 135.630005 137.110001 1196700.0 -1.013784 136.612503 Synopsys Inc.
... ... ... ... ... ... ... ... ... ... ... ...
541678 2024-08-28 SNPS 509.980011 509.980011 523.799988 506.329987 522.909973 1082800.0 -2.472694 515.754990 Synopsys Inc.
541679 2024-08-29 SNPS 517.020020 517.020020 526.429993 513.429993 514.919983 932400.0 0.407837 517.949997 Synopsys Inc.
541680 2024-08-30 SNPS 519.580017 519.580017 529.000000 514.440002 525.530029 1233500.0 -1.132193 522.137512 Synopsys Inc.
541681 2024-09-03 SNPS 481.220001 481.220001 514.400024 479.089996 514.150024 2561000.0 -6.404750 497.215012 Synopsys Inc.
541682 2024-09-04 SNPS 477.529999 477.529999 483.399994 472.600006 475.658508 1892187.0 0.393453 477.297127 Synopsys Inc.

1251 rows × 11 columns

Moving Average¶

A very basic Measure to smothen out the Noise is using moving averages

  • it's speciality Lies in using it's Golden Cross and Death Cross features in it

So we are going to use comapative comaises and comapre them with each other now

In [25]:
PANW_df
Out[25]:
date ticks adj close close high low open volume daily return daily_mean co_name
454113 2019-09-16 PANW 71.933334 71.933334 72.436668 69.243332 69.339996 4389000.0 3.740032 70.738333 Palo Alto Networks
454114 2019-09-17 PANW 70.896667 70.896667 71.823334 69.796669 71.496666 3570000.0 -0.839198 71.003334 Palo Alto Networks
454115 2019-09-18 PANW 70.063332 70.063332 71.000000 69.139999 70.883331 4560000.0 -1.156830 70.271666 Palo Alto Networks
454116 2019-09-19 PANW 69.629997 69.629997 70.266670 69.063332 69.720001 4064400.0 -0.129093 69.670000 Palo Alto Networks
454117 2019-09-20 PANW 69.416664 69.416664 70.860001 69.059998 69.666664 2958300.0 -0.358852 69.750832 Palo Alto Networks
... ... ... ... ... ... ... ... ... ... ... ...
455359 2024-08-28 PANW 350.440002 350.440002 352.779999 346.079987 349.470001 1653300.0 0.277564 349.692497 Palo Alto Networks
455360 2024-08-29 PANW 358.209991 358.209991 366.049988 352.220001 352.920013 2728500.0 1.498917 357.349998 Palo Alto Networks
455361 2024-08-30 PANW 362.720001 362.720001 362.970001 354.570007 358.420013 2988600.0 1.199706 359.670006 Palo Alto Networks
455362 2024-09-03 PANW 355.070007 355.070007 365.500000 352.950012 359.549988 2613800.0 -1.245997 358.267502 Palo Alto Networks
455363 2024-09-04 PANW 346.149994 346.149994 348.040009 338.089996 343.299988 3013721.0 0.830179 343.894997 Palo Alto Networks

1251 rows × 11 columns

In [26]:
dataframes = [PANW_df, CRWD_df]
labels = ['PANW', 'CRWD']

PANW_df['date'] = pd.to_datetime(PANW_df['date'])
CRWD_df['date'] = pd.to_datetime(CRWD_df['date'])
PANW_df.set_index('date', inplace=True)
CRWD_df.set_index('date', inplace=True)
In [ ]:
 
In [27]:
short_window = 50
long_window = 200

# Loop through companies and calculate moving averages
for company, label in zip([PANW_df, CRWD_df], ['PANW', 'CRWD']):
    company['Short MA'] = company['adj close'].rolling(window=short_window).mean()
    company['Long MA'] = company['adj close'].rolling(window=long_window).mean()
    
    # Detect Golden Cross and Death Cross
    company['Signal'] = 0.0  # Initialize signal column
    company['Signal'] = np.where(company['Short MA'] > company['Long MA'], 1.0, 0.0)
    company['Cross'] = company['Signal'].diff()

# Function to plot stock with Golden Cross and Death Cross
def plot_stock_with_cross(company, name, ax):
    ax.plot(company['adj close'], label='Adjusted Close', alpha=0.6)
    ax.plot(company['Short MA'], label=f'{short_window}-day MA', alpha=0.7)
    ax.plot(company['Long MA'], label=f'{long_window}-day MA', alpha=0.7)
    
    # Mark Golden Cross (1) and Death Cross (-1)
    ax.plot(company[company['Cross'] == 1].index, company['Short MA'][company['Cross'] == 1], '^', markersize=10, color='g', lw=0, label='Golden Cross')
    ax.plot(company[company['Cross'] == -1].index, company['Short MA'][company['Cross'] == -1], 'v', markersize=10, color='r', lw=0, label='Death Cross')
    
    ax.set_title(name)
    ax.legend()

    # Set major ticks to every year (from 2019 to 2024)
    ax.xaxis.set_major_locator(mdates.YearLocator())  # Every year
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))  # Format as 'Year'
    
    # Ensure x-axis spans from 2019 to 2024
    ax.set_xlim([pd.Timestamp('2019-01-01'), pd.Timestamp('2024-12-31')])

# Set up plot with subplots for PANW and CRWD
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(15, 6))

# Plot for PANW and CRWD
plot_stock_with_cross(PANW_df, 'PANW', axes[0])
plot_stock_with_cross(CRWD_df, 'CRWD', axes[1])

plt.tight_layout()
plt.show()
No description has been provided for this image
  • CrowdStrike had faced 2 Death cross one of them is in Year 2024 , I think we have in idea why and when it did that happen

-- We will Analyze the most Recent Event that could Have caused the strike

In [28]:
check = CRWD_df[CRWD_df['Cross'] == -1]
check
Out[28]:
ticks adj close close high low open volume daily return daily_mean co_name Short MA Long MA Signal Cross
date
2022-01-04 CRWD 189.190002 189.190002 197.445007 183.320007 195.889999 6503400.0 -3.420285 191.461254 CrowdStrike Holdings 235.3460 236.554925 0.0 -1.0
2024-09-03 CRWD 266.600006 266.600006 277.510010 265.290009 275.779999 5485800.0 -3.328738 271.295006 CrowdStrike Holdings 299.4642 300.198974 0.0 -1.0
  • It looks like , Our Data is working as Crowdstrike did indeed had a death cross ,
  • Full Article

image-2.png

Market and Equity Analysis of Stocks¶

In [29]:
symbols = ['GOOGL', 'MSFT','WMT','AMZN','BABA','ORCL','SAP','CRM']

holders_data = []
for symbol in symbols:
    stock = yf.Ticker(symbol)
    try:
        institutional_holders = stock.institutional_holders
        holders_data.append({
            'Stock Symbol': symbol,
            'Top 10 Institutional Holders': institutional_holders.head(10)
        })
    except:
        holders_data.append({
            'Stock Symbol': symbol,
            'Top 10 Institutional Holders': None
        })

Holders_data_df = pd.DataFrame(holders_data)
Holders_data_df.set_index('Stock Symbol', inplace=True)
Holders_data_df
Out[29]:
Top 10 Institutional Holders
Stock Symbol
GOOGL Date Reported Ho...
MSFT Date Reported Ho...
WMT Date Reported Ho...
AMZN Date Reported Ho...
BABA Date Reported H...
ORCL Date Reported ...
SAP Date Reported Hold...
CRM Date Reported Holder...

Equity ownership analysis¶

By Top 10 Institutional StakeHolders

In [30]:
from plotly.subplots import make_subplots

def create_interactive_holdings_pie_chart(df):
    # Create a 4x2 subplot grid
    fig = make_subplots(rows=4, cols=2, subplot_titles=df.index, specs=[[{'type':'domain'}, {'type':'domain'}]] * 4)

    for i, (symbol, data) in enumerate(df.iterrows(), 1):
        row = (i - 1) // 2 + 1
        col = (i - 1) % 2 + 1

        if data['Top 10 Institutional Holders'] is not None:
            holders = data['Top 10 Institutional Holders']
            
            # Create a pie chart for each stock
            fig.add_trace(
                go.Pie(
                    labels=holders['Holder'],
                    values=holders['Shares'],
                    name=symbol,
                    textposition='inside',
                    textinfo='percent+label',
                    hoverinfo='label+value+percent',
                    textfont_size=8
                ),
                row=row, col=col
            )

    # Update layout
    fig.update_layout(
        height=2000, 
        width=1400,
        title_text="Top 10 Institutional Holders for Various Stocks",
        showlegend=False
    )

    return fig

# Assuming Holders_data_df is your DataFrame with the data
# Create the plot
fig = create_interactive_holdings_pie_chart(Holders_data_df)

# Show the plot
fig.show()
  • it's quite intresting to note , How these Big investment firms Owns , Very Similar Shares in Both Tech Giants

Next we are Going to do some Analysis , Selecting do a Side by Side Comparision on Some Direct Alternates of Each Other as¶

S&P Index Comp alternate 1 alternate 2 Stock Symbols
Google (Alphabet) Microsoft GOOG/GOOGL, MSFT
Walmart Amazon Alibaba WMT, AMZN, BABA
Oracle SAP Salesforce ORCL, SAP, CRM
In [31]:
symbols = ['GOOGL', 'MSFT','WMT','AMZN','BABA','ORCL','SAP','CRM']

Historic EPS and PE Ratio¶

Obtaining current EPS and PE ratio data from YFinance is straightforward, but finding a way to get it for the last 12 months has been a significant challenge in my research.

  • Either the sources are paid, such as Quandl, Zack, or Alpha Vantage API
  • Or, web scraping is not allowed, leading to forbidden errors

My Approach¶

  • Initially, I planned to scrape the data from the Macrotrends Website using pages like this
  • However, scraping is not allowed, and I encountered a forbidden error , if You Know how to tackle the Fobidden warning , thean you case use the similar Code Below
  • As a worst-case scenario, you can save the tables in a text file and read them manually to extract the data
import pandas as pd

STOCK_URLS = {
    'AAPL': 'https://www.macrotrends.net/stocks/charts/AAPL/apple/pe-ratio',
    'AMD': 'https://www.macrotrends.net/stocks/charts/AMD/amd/pe-ratio',
    'NVDA': 'https://www.macrotrends.net/stocks/charts/NVDA/nvidia/pe-ratio',
    'MSFT': 'https://www.macrotrends.net/stocks/charts/MSFT/microsoft/pe-ratio'
}

# Initialize empty DataFrames
dfs = {stock: pd.DataFrame() for stock in STOCK_URLS.keys()}

# Loop through each stock and URL
for stock, url in STOCK_URLS.items():
    data = pd.read_html(url, skiprows=1)
    df = pd.DataFrame(data[0])
    df = df.columns.to_frame().T.append(df, ignore_index=True)
    df.columns = range(len(df.columns))
    df = df[1:]
    df = df.rename(columns={0: 'Date', 1: 'Price', 2:'EPS', 3:'PE ratio'})
    df['EPS'][1:] = ''
    df.set_index('Date', inplace=True)
    df = df.sort_index()
    df['trend'] = ''
    df['PE ratio'] = df['PE ratio'].astype(float)

In [32]:
file_paths = {
    'AMD': r"C:\Users\PMLS\Desktop\MicroTrends _Stock_info\AMD PE Ratio 2010-2024.txt",
    'Apple': r"C:\Users\PMLS\Desktop\MicroTrends _Stock_info\Apple PE Ratio 2010-2024.txt",
    'Microsoft': r"C:\Users\PMLS\Desktop\MicroTrends _Stock_info\Microsoft PE Ratio 2010-2024.txt",
    'NVIDIA': r"C:\Users\PMLS\Desktop\MicroTrends _Stock_info\NVIDIA PE Ratio 2010-2024.txt"
}

def clean_eps(value):
    if pd.isna(value) or value == '':
        return None
    return float(value.replace('$', ''))

dfs = {}

for stock, file_path in file_paths.items():
    df = pd.read_csv(file_path, sep=r'\s+', header=None, names=['Date', 'Price', 'EPS', 'PE ratio'])
    
    df['EPS'] = df['EPS'].apply(clean_eps)
    
    df['Price'] = pd.to_numeric(df['Price'], errors='coerce')
    df['PE ratio'] = pd.to_numeric(df['PE ratio'], errors='coerce')
    
    df['Date'] = pd.to_datetime(df['Date'])
    df.set_index('Date', inplace=True)
    
    df = df.iloc[1:]
    
    df.sort_index(inplace=True, ascending=False)
    
    dfs[stock] = df
In [33]:
# Extract the EPS columns from the DataFrames
eps_values = [df['EPS'] for df in dfs.values()]

fig = go.Figure()

# Add lines for each stock
for stock, eps in zip(dfs.keys(), eps_values):
    fig.add_trace(go.Scatter(x=eps.index, y=eps.values, name=stock, line=dict(width=2)))

# Set title and labels
fig.update_layout(
    title='EPS Over Time',
    xaxis_title='Date',
    yaxis_title='EPS',
    plot_bgcolor='#333333',
    paper_bgcolor='#333333',
    font_color='#ffffff'
)

fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)

# Set vibrant line colors
fig.update_traces(marker=dict(color=['#ff69b4', '#33cc33', '#6666ff', '#ffcc00']))

# Show the legend
fig.update_layout(showlegend=True)

fig.update_layout(
    width=600,  # Set the figure width (in pixels)
    height=300,  # Set the figure height (in pixels)
)

# Show the plot
fig.show()
  • Microsoft Apple Continues
In [34]:
pe_ratios = [df['PE ratio'] for df in dfs.values()]
stock_symbols = list(dfs.keys())

# Create the figure
fig = go.Figure()

# Add lines for all stocks, using the stock symbols from the dictionary keys
for i, stock in enumerate(stock_symbols):
    fig.add_trace(go.Scatter(x=pe_ratios[i].index, y=pe_ratios[i].values, name=stock, line=dict(width=2), visible=(i < 2)))

# Set title and labels
fig.update_layout(
    title='PE Ratio Over Time',
    xaxis_title='Date',
    yaxis_title='PE Ratio',
    plot_bgcolor='#333333',
    paper_bgcolor='#333333',
    font_color='#ffffff'
)

# Set vibrant line colors
fig.update_traces(marker=dict(color=['#ff69b4', '#33cc33', '#6666ff', '#ffcc00']))
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)
# Add dropdown menu
buttons = [
    dict(label=f'{stock_symbols[0]} & {stock_symbols[3]}', method='update', args=[{'visible': [True, False, False, True]}]),
    dict(label=f'{stock_symbols[1]} & {stock_symbols[2]}', method='update', args=[{'visible': [False, True, True, False]}]),
    dict(label='All 4', method='update', args=[{'visible': [True, True, True, True]}])
]
fig.update_layout(updatemenus=[dict(buttons=buttons, direction='down', showactive=True)])

# Show the legend
fig.update_layout(showlegend=True)

# Show the plot
fig.show()

PE Ratio Insights¶

The ideal PE Ratio is difficult to define in the tech industry, but on average, it falls between 15-16. For mature companies, it can be larger, ranging from 20-25. [1]

From the chart we can see that

  • AMD: In September 2023, AMD had a remarkably All time high PE Ratio of 856.83, accompanied by significant volatility.
  • Nvidia: Nvidia maintained a relatively less variable PE Ratio, which, although still larger than the average, showed more stability.

[1] Source: What is a Good PE Ratio?

In [ ]: